Comparing Dissimilarity Measures for Symbolic Data Analysis

Authors

  • Donato MALERBA
  • Floriana ESPOSITO
  • Vincenzo GIOVIALE
  • Valentina TAMMA
Abstract

Nowadays, data analysts are confronted with new challenges: they are asked to process data that go beyond the classical framework, as in the case of data concerning more or less homogeneous classes or groups of individuals (second-order objects) instead of single individuals (first-order objects). A typical situation is that of census data, which raise privacy issues in all governmental agencies that distribute them. To guarantee that data analysts cannot identify an individual or a single business establishment, data are made available in aggregate form. Data aggregations by census tracts or by enumeration districts are examples of second-order objects.

Similar resources

Dissimilarity measures for histogram-valued data and divisive clustering of symbolic objects

Contemporary datasets are becoming increasingly large and complex, while techniques to analyse them are becoming more and more inadequate. Thus, new methods are needed to handle these new types of data. This study introduces methods to cluster histogram-valued data. However, histogram-valued data are difficult to handle computationally because observations typically have a different numbe...

Clustering Symbolic Time-Series using L-tuples

Among the many dimensionality reduction methods for time-series data, Symbolic Aggregate approXimation (SAX) is perhaps the most popular due to its simplicity and uniqueness. With SAX, time-series data can be represented as string sequences, which enables the use of methods found in text mining and bioinformatics to enhance data mining tasks. We propose an application of L-tuples to impro...

A New Symbolic Dissimilarity Measure for Multivalued Data Type and Novel Dissimilarity Approximation Techniques

In this paper, a new statistical measure is proposed for estimating the degree of dissimilarity between two symbolic objects whose features are of multivalued symbolic data type. In addition, two new simple representation techniques, viz., interval type and magnitude type, are introduced for the computed dissimilarity between the symbolic objects. The dissimilarity matrices obtained are not necessaril...

Analysis of Distribution Valued Dissimilarity Data

We deal with methods for analyzing complex structured data, especially, distribution valued data. Nowadays, there are many requests to analyze various types of data including spatial data, time series data, functional data and symbolic data. The idea of symbolic data analysis proposed by Diday covers a large range of data structures. We focus on distribution valued dissimilarity data and multid...

Comparing Dissimilarity Measures: A Case of Banking Ratios

The aim of this paper is twofold. Firstly, to discuss a clustering of a given set of the European banks into groups based on their performance during 1999–2013. Secondly, to compare different dissimilarity measures and to determine which of them suits best for clustering banking ratios. Six ratios that reveal profitability, efficiency, stability and loan portfolio quality of the banks were used...

Publication date: 2001